Conversation

@burakdede (Contributor)

Addresses the improvement requested in #164.

  • Add --follow to tower apps logs with SSE streaming, retry/backoff, Ctrl+C handling, dedupe across reconnects, and fatal-error exits (the reconnect/dedupe loop is sketched after this list).
  • Surface log-stream warnings and harden reconnection behavior (follow-up commits cancel stale monitors, add explicit terminal-status handling, ...).
  • Add unit tests for parsing, terminal statuses, backoff, dedupe, and out‑of‑order logs.
  • Extend mock API to emit warning SSE events and add an integration scenario to validate warning visibility.
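
Roughly, the follow loop behaves like this. This is a minimal synchronous sketch, not the actual code: `LogLine`, `next_batch`, the five-error threshold, and the backoff constants are illustrative stand-ins, and the real implementation streams SSE asynchronously.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Log line as delivered by the stream; the real payload shape lives in
/// the Tower API types, so this struct is an assumption.
struct LogLine {
    num: u64,
    text: String,
}

/// Print a line only if it has not been emitted yet. The server replays
/// recent lines after a reconnect, so tracking the highest line number
/// seen is enough to dedupe (and to drop out-of-order duplicates).
fn emit_log_if_new(line: &LogLine, last_line_num: &mut u64) {
    if line.num > *last_line_num {
        println!("{}", line.text);
        *last_line_num = line.num;
    }
}

fn main() {
    let mut last_line_num = 0u64;
    let mut errors = 0u32;
    let mut backoff = Duration::from_millis(500);
    let max_backoff = Duration::from_secs(30);

    loop {
        match next_batch(last_line_num) {
            Ok(lines) => {
                errors = 0;
                backoff = Duration::from_millis(500); // reset after success
                for line in &lines {
                    emit_log_if_new(line, &mut last_line_num);
                }
                if lines.is_empty() {
                    break; // stand-in for "run reached a terminal status"
                }
            }
            Err(_) => {
                errors += 1;
                if errors >= 5 {
                    eprintln!("giving up after repeated errors"); // fatal exit
                    std::process::exit(1);
                }
                sleep(backoff); // exponential backoff before reconnecting
                backoff = (backoff * 2).min(max_backoff);
            }
        }
    }
}

/// Stand-in for the real SSE call; returns lines after `last_line_num`.
fn next_batch(_after: u64) -> Result<Vec<LogLine>, ()> {
    Ok(vec![])
}
```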

The tests should cover most of the critical aspects of the implementation, but I have also exercised the following scenarios manually with a local Tower build and a deployed app:

Drain Logs & Clean Exit

2026-01-15 19:10:03 | tick=48 time=2026-01-15T19:10:03.529504Z
2026-01-15 19:10:04 | tick=49 time=2026-01-15T19:10:04.529650Z
2026-01-15 19:10:05 | tick=50 time=2026-01-15T19:10:05.529810Z
2026-01-15 19:10:05 | WARN synthetic warning tick=50 time=2026-01-15T19:10:05.529810Z
2026-01-15 19:10:05 | ERROR synthetic error tick=50 time=2026-01-15T19:10:05.529810Z
2026-01-15 19:10:06 | tick=51 time=2026-01-15T19:10:06.529973Z
2026-01-15 19:10:07 | tick=52 time=2026-01-15T19:10:07.530149Z
2026-01-15 19:10:08 | tick=53 time=2026-01-15T19:10:08.530332Z
2026-01-15 19:10:09 | tick=54 time=2026-01-15T19:10:09.530482Z
2026-01-15 19:10:10 | tick=55 time=2026-01-15T19:10:10.530652Z
2026-01-15 19:10:10 | WARN synthetic warning tick=55 time=2026-01-15T19:10:10.530652Z
2026-01-15 19:10:11 | tick=56 time=2026-01-15T19:10:11.530797Z
Warning: No new logs available

Intermittent Failure Results & Slightly-Graceful Exit

2026-01-15 19:21:06 | WARN synthetic warning tick=85 time=2026-01-15T19:21:06.018685Z
2026-01-15 19:21:07 | tick=86 time=2026-01-15T19:21:07.018867Z
2026-01-15 19:21:08 | tick=87 time=2026-01-15T19:21:08.019028Z
2026-01-15 19:21:09 | tick=88 time=2026-01-15T19:21:09.019180Z
2026-01-15 19:21:10 | tick=89 time=2026-01-15T19:21:10.019344Z
2026-01-15 19:21:11 | tick=90 time=2026-01-15T19:21:11.020777Z
2026-01-15 19:21:11 | WARN synthetic warning tick=90 time=2026-01-15T19:21:11.020777Z
2026-01-15 19:21:11 | ERROR synthetic error tick=90 time=2026-01-15T19:21:11.020777Z
2026-01-15 19:21:11 | {"event":"heartbeat","tick":90,"time":"2026-01-15T19:21:11.020777Z"}
2026-01-15 19:21:12 | tick=91 time=2026-01-15T19:21:12.020962Z
2026-01-15 19:21:13 | tick=92 time=2026-01-15T19:21:13.021117Z
2026-01-15 19:21:14 | tick=93 time=2026-01-15T19:21:14.021280Z
2026-01-15 19:21:15 | tick=94 time=2026-01-15T19:21:15.021446Z
2026-01-15 19:21:16 | tick=95 time=2026-01-15T19:21:16.021620Z
2026-01-15 19:21:16 | WARN synthetic warning tick=95 time=2026-01-15T19:21:16.021620Z
2026-01-15 19:21:17 | tick=96 time=2026-01-15T19:21:17.021776Z
Oh no! Failed to monitor run completion after repeated errors
Oh no! The Tower CLI wasn't able to talk to the Tower API! Are you offline? Try again later.
Error: Fetching run details failed

Kill Log Stream & Clean Exit

2026-01-15 19:09:50 | WARN synthetic warning tick=35 time=2026-01-15T19:09:50.527412Z
2026-01-15 19:09:51 | tick=36 time=2026-01-15T19:09:51.527581Z
2026-01-15 19:09:52 | tick=37 time=2026-01-15T19:09:52.527723Z
2026-01-15 19:09:53 | tick=38 time=2026-01-15T19:09:53.527879Z
2026-01-15 19:09:54 | tick=39 time=2026-01-15T19:09:54.528027Z
2026-01-15 19:09:55 | tick=40 time=2026-01-15T19:09:55.528179Z
2026-01-15 19:09:55 | WARN synthetic warning tick=40 time=2026-01-15T19:09:55.528179Z
2026-01-15 19:09:55 | ERROR synthetic error tick=40 time=2026-01-15T19:09:55.528179Z
2026-01-15 19:09:56 | tick=41 time=2026-01-15T19:09:56.528346Z
2026-01-15 19:09:57 | tick=42 time=2026-01-15T19:09:57.528500Z
2026-01-15 19:09:58 | tick=43 time=2026-01-15T19:09:58.528661Z
2026-01-15 19:09:59 | tick=44 time=2026-01-15T19:09:59.528807Z
2026-01-15 19:10:00 | tick=45 time=2026-01-15T19:10:00.528944Z
2026-01-15 19:10:00 | WARN synthetic warning tick=45 time=2026-01-15T19:10:00.528944Z
2026-01-15 19:10:00 | {"event":"heartbeat","tick":45,"time":"2026-01-15T19:10:00.528944Z"}
2026-01-15 19:10:01 | tick=46 time=2026-01-15T19:10:01.529202Z
2026-01-15 19:10:02 | tick=47 time=2026-01-15T19:10:02.529351Z
2026-01-15 19:10:03 | tick=48 time=2026-01-15T19:10:03.529504Z
^CReceived Ctrl+C, stopping log streaming...
Note: The run will continue in Tower cloud
  See more: https://app.tower.dev/tastey-engine-877/default/apps/follow-logs-test-1768503576/runs/4
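
The Ctrl+C path above boils down to racing the stream against the signal. A minimal sketch assuming tokio, where `follow_logs` is a stand-in for the streaming loop rather than the PR's actual function:

```rust
use tokio::signal;

#[tokio::main]
async fn main() {
    tokio::select! {
        // Whichever finishes first wins; Ctrl+C preempts the stream.
        _ = signal::ctrl_c() => {
            eprintln!("Received Ctrl+C, stopping log streaming...");
            eprintln!("Note: The run will continue in Tower cloud");
        }
        _ = follow_logs() => {}
    }
}

/// Stand-in for the streaming loop sketched earlier.
async fn follow_logs() {
    std::future::pending::<()>().await;
}
```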

@bradhe (Contributor)

bradhe commented Jan 16, 2026

Nice, thanks @burakdede. We should make the default branch in this repo develop -- mind rebasing against that? We follow git flow, more or less... it really should be documented.

I'll review the content now!

@bradhe left a comment

Besides the failure re: linting, this is generally really good! I'd really love to merge this with the log-following logic in the do_run function in crates/tower-cmd/src/run.rs -- but I can manage that on our side! All the stuff around retries, etc., is really good and missing from the other implementation.

I'll leave this open for a bit just in case you decide this is something you want to tackle, but otherwise I'll rebase this against develop and land this over the weekend!

Comment on lines +246 to +253
if is_run_finished(&run) {
    // Drain any remaining lines one last time so nothing emitted between
    // the last poll and the terminal status is lost.
    if let Ok(resp) = api::describe_run_logs(&config, &name, seq).await {
        for line in resp.log_lines {
            emit_log_if_new(&line, &mut last_line_num);
        }
    }
    return;
}

This is good, we really should unify these endpoints at some point to make this easier 🙃
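
For context, the hunk hinges on `is_run_finished`; one plausible shape, where the status strings are guesses rather than the API's actual variants:

```rust
struct Run {
    status: String,
}

/// True once the run can no longer produce new logs. The status values
/// here are assumptions; the real check should match the API's enum.
fn is_run_finished(run: &Run) -> bool {
    matches!(
        run.status.as_str(),
        "exited" | "errored" | "crashed" | "cancelled"
    )
}
```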

// Spawn a fresh completion monitor for this connection attempt; keeping
// the oneshot sender lets us cancel the monitor if the stream drops and
// a reconnect spawns a replacement.
let (cancel_tx, cancel_rx) = oneshot::channel();
cancel_monitor = Some(cancel_tx);
let run_complete = monitor_run_completion(&config, &name, seq, cancel_rx);
match api::stream_run_logs(&config, &name, seq).await {

This should be fine functionally, but comparing it to our implementation for starting a remote run and following the logs: we wait for the run to start first... that may not be necessary, we should be able to just start listening? I'll double-check that.
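
One plausible shape for how `monitor_run_completion` honors the cancel channel, assuming tokio and with the signature simplified from the four-argument version in the snippet (the polling interval and loop body are guesses):

```rust
use tokio::sync::oneshot;
use tokio::time::{sleep, Duration};

/// Polls run status until it is terminal, or until the oneshot fires
/// because a reconnect spawned a fresh monitor and cancelled this one.
async fn monitor_run_completion(mut cancel_rx: oneshot::Receiver<()>) {
    loop {
        tokio::select! {
            _ = &mut cancel_rx => return, // stale monitor cancelled
            _ = sleep(Duration::from_secs(2)) => {
                // fetch run details here; return when the status is terminal
            }
        }
    }
}
```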
